This analysis covers:

- Basic submission stats
- Rescoring the results using a reticulate connection to the scoring harness Python script
- Plotting the rescored results to check they are in line with the final leaderboard
- Bootstrap evaluation of the predictions (resampling with replacement) to assess the stability of the results for all six metrics
```r
table(leaderboard$userName) %>%
  as.data.frame() %>%
  arrange(-Freq) %>%
  count(Freq)
```

```
## # A tibble: 3 x 2
##    Freq     n
##   <int> <int>
## 1     1    20
## 2     2    24
## 3     3    34
```
```r
library(DT)
DT::datatable(table(leaderboard$userName) %>% as.data.frame() %>% arrange(-Freq))
```
### Histograms of scores
```r
ggplot(leaderboard) +
  geom_histogram(aes(x = pearson), binwidth = 0.01)

ggplot(leaderboard) +
  geom_histogram(aes(x = spearman), binwidth = 0.01)
```

```
## Warning: Removed 3 rows containing non-finite values (stat_bin).
```

```r
ggplot(leaderboard) +
  geom_histogram(aes(x = log(rmse)), binwidth = 0.01)

ggplot(leaderboard) +
  geom_histogram(aes(x = ci), binwidth = 0.01)

ggplot(leaderboard) +
  geom_histogram(aes(x = f1), binwidth = 0.01)

ggplot(leaderboard) +
  geom_histogram(aes(x = average_AUC), binwidth = 0.01)
```
To bootstrap a given prediction file, this script randomly samples 430 predictions with replacement from the file, computes the six metrics on that resample, and repeats this 20 times per prediction file to generate a distribution of bootstrapped scores. I then plotted the top 20 predictions for each metric using the leaderboard value, superimposed on the distribution of bootstrapped scores for that prediction. Bars are ranked best to worst performer (based on the single leaderboard value); diamonds mark the actual leaderboard value.
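The resampling loop described above can be sketched as follows, shown here for two of the six metrics on simulated data. The column names (`gold`, `pred`) and the simulated predictions are assumptions, not the actual submission format.

```r
set.seed(42)
# Stand-in for one prediction file of 430 gold/predicted value pairs
preds      <- data.frame(gold = runif(430, 5, 9))
preds$pred <- preds$gold + rnorm(430, sd = 0.5)

# 20 bootstrap replicates: resample all 430 rows with replacement,
# then recompute each metric on the resample
boot_scores <- t(replicate(20, {
  s <- preds[sample(nrow(preds), replace = TRUE), ]
  c(pearson = cor(s$gold, s$pred),
    rmse    = sqrt(mean((s$gold - s$pred)^2)))
}))

summary(boot_scores)   # spread of each metric across replicates
```

The spread of `boot_scores` for each metric is what the diamonds (single leaderboard values) are compared against in the plots.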